Abstract

We present a model for the optimal design of an online auction/store by
a seller. The framework we use is a stochastic optimal control problem. In
our setting, the seller wishes to maximize her average wealth level, where
she can control her price per unit via her reputation level. The corresponding
Hamilton-Jacobi-Bellman equation is analyzed for an introductory case,
and pulsing advertising strategies are recovered for resource allocation.

Keywords: Stochastic optimal control models; Online stores; Hamilton-Jacobi-Bellman equation.

1 Introduction 

Auctions are a natural way to assign resources to selfish agents who compete for
them. Examples include wireless spectrum, art, electronic equipment, songs, edges
of a publicly known network [?, 27], and other digital goods [?]. Please refer to
the monograph by Milgrom [23] for an overview of auction theory. Online auctions
have been established as electronic mechanisms in commercial transactions. Over a
very short time, the Internet has become a formidable tool to connect buyers and
sellers separated by large physical distances.

In this electronic arena, online sales have grown by orders of magnitude. Online
auctions, as a tool for commercial transactions, have been established and used
for maximization of the auctioneer's profit [1, 6, 12, 18, 34, 19, 13, ?]. On the
other hand, the growth of the online commercial environment has come at the price
of anonymous transactions, where buyers have less recourse in protesting bad or
unfair service. Online auctions are substantially different from traditional
offline auctions [?, 33, 24, 21], and accordingly their bidders behave differently
[4]. Feedback forums at online auction sites such as Amazon, eBay, and Yahoo!
make it possible to rank other agents, to get informed, and to be ranked.
Naturally, the online comment form has become vital in choosing which seller to
purchase from. If the seller's reputation is damaged via negative feedback, she may
expect a smaller sales rate. Put simply, the higher an agent's reputation, the more
confidence others can place in that agent. The notion of reputation helps agents,
or the system designer, to optimally make their decisions, or to optimally design
the mechanism, respectively [16, 10]. The effect of reputation on market outcomes
in online settings has been studied [3, 31, 30], as has the optimal bidding
strategy in sequential auctions [2].

In an online store, if the seller has profited greatly from dishonest transactions,
or even from decreased spending of resources on (a) advertising or (b) following up
on sales to encourage higher reputation scores, this immediate increase in wealth
may offset the long-term damage from a lower reputation. With this in mind, it is
natural to ask whether an optimal long-term strategy exists. Using such a strategy,
the seller can maximize her average expected wealth at a time of her choosing. Our
work approximates the actions of a seller as being continuous. Such an
approximation serves as an entry point, and it has the benefit of closed-form
solutions in some cases and of the power of functional and numerical analysis in
others. The results we present are encouraging and intuitively satisfying, and they
suggest further study in both discrete and continuous settings.

We note that our version of reputation evolution follows the standard Nerlove-Arrow
construction [26], first postulated in 1962 and later extended to a stochastic
setting by Sethi [32] and others. In Section 2 we propose an optimal finite horizon
strategy, based on certain modeling assumptions, for a seller to maximize her final
average wealth. In this case, we model reputation, or goodwill, as a geometric
Brownian motion. This process is familiar to those who work in quantitative
finance, and it does indeed fall within a Nerlove-Arrow construction [26] if the
adapted control, the advertising rate, is taken to be proportional to the
reputation level. The resulting control is of switching, or pulsing, type and is
familiar to those who work on optimal advertising strategies. In Section 3, our
second model, we propose a finite horizon strategy based on the empirical work of
Mink and Seifert [25], and for the reputation evolution we use the model proposed
by Rao [29] and used by many others in optimal advertising models, such as Raman's
recent work, Boundary Value Problems in the Stochastic Optimal Control of
Advertising [28].



2 Implicit Resource Allocation Mechanism: Finite Time Horizon

For the probability space $(\Omega, \mathcal{F}, P)$, let $B$ be a standard Brownian
motion on this space. Take the current state $(W, R, \mu)$ as the wealth, the
reputation, and the measure of resources spent on advertising or promotion,
respectively. In this scheme, $R$ is a positive number reflecting customer
satisfaction. By choosing to shift up to $100\epsilon$ percent, $\epsilon < 1$, of
her resources from promotion to processing and back, the seller can influence her
wealth and reputation levels via $\mu$. Positive $\mu$ corresponds to a promotional
state; $\mu < 0$ corresponds to a processing state. The evolution of wealth and
reputation is modeled over an interval $[t, T]$, where $0 \le t < T$, by the
two-state Markov process

$$dW_t = G(R_t, \mu_t)\,dt, \qquad dR_t = R_t\,(\mu_t\,dt + \sigma\,dB_t), \qquad \mu \in \mathcal{A}_\epsilon,$$

$$\mathcal{A}_\epsilon := \Big\{\mu \ \mathcal{F}\text{-adapted} : \sup_{t \le s \le T} |\mu_s| \le \epsilon\Big\}.$$

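The general model that we propose is to solve for $u(W, R, 0)$, where $\rho > 0$ is
the discount rate and

$$u(W,R,0) := \sup_{\mu \in \mathcal{A}_\epsilon} \mathbb{E}\left[W_T \mid W_0 = W,\ R_0 = R\right],$$

$$u(W,R,t) := W + \sup_{\mu \in \mathcal{A}_\epsilon} \mathbb{E}\left[\int_t^T e^{-\rho (s-t)} G(R_s,\mu_s)\,ds \,\Big|\, R_t = R\right] = W + e^{\rho t}\, v(R,t),$$

$$v(R,t) := \sup_{\mu \in \mathcal{A}_\epsilon} \mathbb{E}\left[\int_t^T e^{-\rho s} G(R_s,\mu_s)\,ds \,\Big|\, R_t = R\right].$$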

This control problem corresponds to a store with (a) no salvage value upon
closing or transfer, see [28], and (b) no switching costs proportional to $\mu^2$, as
we have the a priori bound $|\mu| \le \epsilon \ll 1$. Our approximation is to assume a
large inventory with an almost constant number of units sold per unit time. Under
this assumption, we take the growth rate to be the product of two factors,
$G(R, \mu) = (1 - \mu)h(R)$. This product, the revenue per unit time, is the price
per unit $h(R)$ multiplied by a resource rate factor $r(\mu) = 1 - \mu$, the number of
units processed per unit time. A factor $r < 1$ ($\mu > 0$) represents the diversion
of resources to advertising in return for a larger reputation score and a larger
potential sales price per unit in the future. If the seller neglects her advertising
duties and diverts resources to processing, i.e., if $\mu < 0$, then $r > 1$ and she
obtains an immediate revenue increase in return for a lower reputation score and a
lower future sales price. It is implicitly assumed that this immediate loss (gain)
can be attributed to spending (absorbing) an extra $100|\mu|$ percent of
time/resources on raising reputation scores (processing). A balance is sought
between these two competing factors, $h$ and $r$.

Finally, we model the evolution of the seller's reputation as a geometric Brownian
motion. This is done to reflect the asset-like nature of goodwill, and it is only
one of many possible approaches. In the next section, we propose a related control
problem that uses the stochastic Nerlove-Arrow model [26]. This model has been used
with success in optimally planning advertising campaigns, and the resulting control
is a pulsing, or switching, strategy similar to the one found using our geometric
Brownian dynamics for goodwill (reputation). Indeed, the geometric Brownian motion
model for the controlled reputation mechanism is also a Nerlove-Arrow type dynamic
[26] if the control is taken to be proportional to the current level of goodwill,
and if the added noise has a diffusion coefficient $\sigma(R)$ that is linearly
proportional to the reputation $R$ instead of constant. With these modeling choices,
the corresponding optimal control problem becomes

$$v(R,t) = \sup_{\mu \in \mathcal{A}_\epsilon} \mathbb{E}\left[\int_t^T e^{-\rho s}(1-\mu_s)\,h(R_s)\,ds \,\Big|\, R_t = R\right].$$



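Then, the corresponding Hamilton-Jacobi-Bellman (HJB) equation is

$$\max_{-\epsilon \le \mu \le \epsilon}\left\{\frac{\partial v}{\partial t} + \mu R\frac{\partial v}{\partial R} + \frac{\sigma^2}{2}R^2\frac{\partial^2 v}{\partial R^2} + e^{-\rho t}(1-\mu)h(R)\right\} = 0, \qquad v(R,T) = 0.$$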



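Since the expression in braces is affine in $\mu$, the maximum is attained at an
endpoint, $\mu = \epsilon\,\mathrm{sgn}\big(R\,\partial v/\partial R - e^{-\rho t}h(R)\big)$, and basic
algebra now leads to the following set of equations:

$$\frac{\partial v}{\partial t} + \frac{\sigma^2}{2}R^2\frac{\partial^2 v}{\partial R^2} + \epsilon\left|R\frac{\partial v}{\partial R} - e^{-\rho t}h(R)\right| + e^{-\rho t}h(R) = 0, \qquad v(R,T) = 0. \tag{1}$$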
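Because the maximization behind (1) is pointwise and affine in $\mu$, it can be checked directly at a single point. The sketch below is our own illustration (the helper name `hamiltonian_max` is hypothetical): it takes the two numbers $a = R\,\partial v/\partial R$ and $b = e^{-\rho t}h(R)$ and returns the maximizing $\mu$ together with the maximal value of $\mu a + (1-\mu)b$ over $|\mu| \le \epsilon$.

```python
def hamiltonian_max(a, b, eps):
    """Maximise mu * a + (1 - mu) * b over |mu| <= eps.

    Here a = R dv/dR and b = e^{-rho t} h(R).  The objective equals
    b + mu * (a - b), which is affine in mu, so the maximum sits at an endpoint:
    mu* = eps * sgn(a - b), with maximal value b + eps * |a - b|.
    """
    gap = a - b
    mu_star = eps if gap > 0 else -eps   # when gap == 0 every admissible mu is optimal
    value = b + eps * abs(gap)
    return mu_star, value
```

The returned value is exactly the nonlinear term $\epsilon|R\,\partial_R v - e^{-\rho t}h| + e^{-\rho t}h$, and the sign of the gap is what produces the bang-bang (pulsing) structure of the control.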



For general functions $h(\cdot)$, we must rely on the theory of partial differential
equations. When $h(\cdot)$ takes on a special form, we obtain simple, closed-form
results via separation of variables. For example, we consider a power law model in
the next subsection.
2.1 Power Law Model for the Growth Rate



Consider the growth rate $G(R, \mu) = (1 - \mu)R^\gamma$, with $0 < \gamma \le 1$. This
case corresponds to an object that has no inherent value if the seller's reputation
is zero. As $h(0) = 0$, we expect that $\lim_{R \to 0} v(R, t) = 0$. We can in fact
show this via the following lemma.

Lemma 1. For $G(R, \mu) = (1 - \mu)R^\gamma$, we have $\lim_{R \to 0} v(R, t) = 0$.

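Proof. Since $\mu \le \epsilon$, standard SDE comparison theory [17] implies that for
all $t \le s \le T$ we have $P[R_s \le \bar R_s] = 1$, where
$d\bar R_s = \bar R_s(\epsilon\,ds + \sigma\,dB_s)$ and $\bar R_t = R$. Using
$1 - \mu_s \le 1 + \epsilon$, $e^{-\rho s} \le e^{-\rho t}$, and Jensen's inequality
$\mathbb{E}[\bar R_s^\gamma] \le (\mathbb{E}[\bar R_s])^\gamma = R^\gamma e^{\gamma\epsilon(s-t)}$
(valid for $\gamma \le 1$), we obtain

$$0 \le v(R,t) \le \int_t^T e^{-\rho s}(1+\epsilon)\,\mathbb{E}\left[\bar R_s^\gamma \mid \bar R_t = R\right]ds \le (1+\epsilon)\,e^{-\rho t}R^\gamma\int_t^T e^{\gamma\epsilon(s-t)}\,ds \xrightarrow[R \to 0]{} 0,$$

which proves the assertion of the lemma. This can also serve as an upper bound
for $v(R,t)$. □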
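The last bound in the proof, $(1+\epsilon)e^{-\rho t}R^\gamma\int_t^T e^{\gamma\epsilon(s-t)}\,ds$, has a closed form. The sketch below (a hypothetical helper with illustrative parameter values, not part of the original analysis) evaluates it and makes the $R^\gamma$ decay as $R \to 0$ explicit.

```python
import math


def lemma1_upper_bound(R, t, T, gamma, eps, rho):
    """(1 + eps) e^{-rho t} R^gamma * int_t^T e^{gamma eps (s - t)} ds, in closed form.

    The integral equals (e^{gamma eps (T - t)} - 1) / (gamma eps); expm1 keeps it
    accurate when gamma * eps * (T - t) is small.
    """
    tau = T - t
    k = gamma * eps
    integral = tau if k == 0 else math.expm1(k * tau) / k
    return (1.0 + eps) * math.exp(-rho * t) * R ** gamma * integral
```

Since the bound is proportional to $R^\gamma$, quartering $R$ halves it when $\gamma = 1/2$, and it vanishes as $R \to 0$ for any fixed $t$ and $T$.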

As is sometimes possible with optimal control problems in finance and operations
research, we assume an Ansatz for the solution of (1). Separation of variables then
leads to a solution-control candidate pair of the form

$$v(R,t) = e^{-\rho t}\psi_{\gamma,\rho}(t)R^\gamma, \qquad \mu_t = \epsilon\,\mathrm{sgn}\big(\gamma\psi_{\gamma,\rho}(t) - 1\big).$$
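As a worked step, assume $h(R) = R^\gamma$ and write $\psi = \psi_{\gamma,\rho}$. The Ansatz gives the derivatives below; substituting them into (1) and dividing by $e^{-\rho t}R^\gamma > 0$ removes $R$ entirely, which is why separation of variables succeeds for the power law:

$$\frac{\partial v}{\partial t} = e^{-\rho t}(\psi' - \rho\psi)R^\gamma, \qquad R\frac{\partial v}{\partial R} = e^{-\rho t}\gamma\psi R^\gamma, \qquad R^2\frac{\partial^2 v}{\partial R^2} = e^{-\rho t}\gamma(\gamma-1)\psi R^\gamma,$$

$$\Longrightarrow\quad \psi' - \Big(\rho + \frac{\sigma^2}{2}\gamma(1-\gamma)\Big)\psi + \epsilon\left|\gamma\psi - 1\right| + 1 = 0, \qquad \psi(T) = 0.$$

The sign of $\gamma\psi - 1$ inside the absolute value is the same quantity that selects $\mu_t = \pm\epsilon$ in the candidate control.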

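We now prove the following theorem.

Theorem 2. Let

$$v(R,t) = \sup_{\mu \in \mathcal{A}_\epsilon}\mathbb{E}\left[\int_t^T e^{-\rho s}(1-\mu_s)R_s^\gamma\,ds \,\Big|\, R_t = R\right].$$

Then $v(R,t) = e^{-\rho t}\psi_{\gamma,\rho}(t)R^\gamma$, where $\psi_{\gamma,\rho}$ solves

$$\psi_{\gamma,\rho}'(t) - \Big(\rho + \frac{\sigma^2}{2}\gamma(1-\gamma)\Big)\psi_{\gamma,\rho}(t) + \epsilon\left|\gamma\psi_{\gamma,\rho}(t) - 1\right| + 1 = 0, \qquad \psi_{\gamma,\rho}(T) = 0. \tag{2}$$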



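Proof. For ease of computation, consider the case $\gamma = 1$, $\rho = 0$. (The case
$\gamma < 1$, $\rho > 0$ follows from a similar computation.) Then basic algebra leads to

$$\frac{\partial v}{\partial t} + \frac{\sigma^2}{2}R^2\frac{\partial^2 v}{\partial R^2} + \epsilon R\left|\frac{\partial v}{\partial R} - 1\right| + R = 0, \qquad v(R,T) = 0.$$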
The corresponding control policy is $\mu_s = \epsilon\,\mathrm{sgn}\big(\frac{\partial v}{\partial R}(R_s, s) - 1\big)$.
The ordinary differential equation (2) reduces to

$$\psi_{1,0}'(t) + 1 + \epsilon\left|\psi_{1,0}(t) - 1\right| = 0, \qquad \psi_{1,0}(T) = 0.$$

Although this is a nonlinear boundary value problem, it is still of first order, and
separation of variables applied to $d\psi_{1,0}/dt = -\big(1 + \epsilon|\psi_{1,0} - 1|\big)$
shows that, for some constant $C_1$ (which may differ between the two regions), the
following is satisfied:



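$$\epsilon t + C_1 = \begin{cases} \ln\big(1 + \epsilon(1 - \psi_{1,0})\big), & \text{if } \psi_{1,0} < 1,\\ -\ln\big(1 - \epsilon(1 - \psi_{1,0})\big), & \text{if } \psi_{1,0} > 1.\end{cases}$$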



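Using the boundary condition $\psi_{1,0}(T) = 0$ and the continuity of $\psi_{1,0}$ at the
point where it equals 1, we obtain

$$\psi_{1,0}(t) = \begin{cases} \dfrac{1+\epsilon}{\epsilon}\left(1 - e^{-\epsilon(T-t)}\right), & \text{if } T - \frac{1}{\epsilon}\ln(1+\epsilon) \le t \le T,\\[2mm] \dfrac{e^{\epsilon(T-t)} + \epsilon^2 - 1}{\epsilon(1+\epsilon)}, & \text{if } 0 \le t \le T - \frac{1}{\epsilon}\ln(1+\epsilon),\end{cases}$$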

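and $\mu_t = \epsilon\,\mathrm{sgn}\big(\frac{\partial v}{\partial R} - 1\big) = \epsilon\,\mathrm{sgn}\big(\psi_{1,0}(t) - 1\big)$ implies

$$\mu_t = \begin{cases} \epsilon, & \text{if } 0 \le t < T - \frac{1}{\epsilon}\ln(1+\epsilon),\\ -\epsilon, & \text{if } T - \frac{1}{\epsilon}\ln(1+\epsilon) < t \le T.\end{cases}$$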

Notice that $\psi_{\gamma,\rho} \in C^1([0,T])$ and that $R^\gamma \in C^\infty(0,\infty)$. It is
straightforward to see that $\psi_{\gamma,\rho}(t)R^\gamma$ is in the standard class of
functions for verification (see [?]), and is thus a solution to the optimal control
problem. In the case $\gamma = 1$, if the time horizon $T$ satisfies
$\epsilon^{-1}\ln(1+\epsilon) < T < 1 + \epsilon^{-1}\ln(1+\epsilon)$, then the switching time
$T - \epsilon^{-1}\ln(1+\epsilon)$ indeed lies in $(0, 1)$. Otherwise, on $[0, 1]$ the optimal
policy is either $\mu \equiv \epsilon$ or $\mu \equiv -\epsilon$, depending on the size of the time
horizon $T$. Again, a similar region for $T$ is expected if $\gamma < 1$. □
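The closed form for $\psi_{1,0}$ can be cross-checked numerically. The following sketch (our own illustration; all function names are hypothetical) evaluates the piecewise formula, integrates $\psi' = -(1 + \epsilon|\psi - 1|)$ backward from $\psi(T) = 0$ with a classical fourth-order Runge-Kutta scheme, and encodes the resulting bang-bang control.

```python
import math


def switch_time(T, eps):
    """t* = T - ln(1 + eps)/eps: the time at which psi_{1,0} crosses 1."""
    return T - math.log1p(eps) / eps


def psi_closed_form(t, T, eps):
    """Closed-form psi_{1,0}(t) (gamma = 1, rho = 0) from Section 2.1."""
    if t >= switch_time(T, eps):
        return (1.0 + eps) / eps * (1.0 - math.exp(-eps * (T - t)))
    return (math.exp(eps * (T - t)) + eps ** 2 - 1.0) / (eps * (1.0 + eps))


def psi_rk4(t, T, eps, n=20000):
    """Solve psi' = -(1 + eps |psi - 1|), psi(T) = 0, backward to time t by RK4.

    In time-to-go tau = T - s the equation reads dpsi/dtau = 1 + eps |psi - 1|.
    """
    g = lambda p: 1.0 + eps * abs(p - 1.0)
    step = (T - t) / n
    p = 0.0
    for _ in range(n):
        k1 = g(p)
        k2 = g(p + 0.5 * step * k1)
        k3 = g(p + 0.5 * step * k2)
        k4 = g(p + step * k3)
        p += step * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


def optimal_control(t, T, eps):
    """Bang-bang control mu_t = eps * sgn(psi_{1,0}(t) - 1): +eps before t*, -eps after."""
    return eps if t < switch_time(T, eps) else -eps
```

Note that `optimal_control` never looks at the reputation level, which mirrors the observation that this control depends only on the time remaining. If $T \le \epsilon^{-1}\ln(1+\epsilon)$, the switching time is nonpositive and the sketch returns $-\epsilon$ for every $t \ge 0$.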

Interestingly, this control is independent of the reputation level $R$ and depends
only on the time remaining.

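Consider now

$$\mu_t = \epsilon\,\mathbf{1}\Big(0 \le t < T - \tfrac{1}{\epsilon}\ln(1+\epsilon)\Big) - \epsilon\,\mathbf{1}\Big(T - \tfrac{1}{\epsilon}\ln(1+\epsilon) \le t \le T\Big), \qquad dR_t = R_t(\mu_t\,dt + \sigma\,dB_t).$$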






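For the sake of completeness, we now verify for $T - \frac{1}{\epsilon}\ln(1+\epsilon) \le t \le T$ that

$$\psi_{1,0}(t)R = \mathbb{E}\left[\int_t^T (1-\mu_s)R_s\,ds \,\Big|\, R_t = R\right].$$

In fact, direct substitution leads to

$$\begin{aligned}\mathbb{E}\left[\int_t^T (1-\mu_s)R_s\,ds \,\Big|\, R_t = R\right] &= \int_t^T \mathbb{E}\left[(1-\mu_s)R_s \mid R_t = R\right]ds = \int_t^T \mathbb{E}\left[(1-(-\epsilon))R_s \mid R_t = R\right]ds\\ &= (1+\epsilon)\int_0^{T-t}\mathbb{E}\left[R_s \mid R_0 = R\right]ds = (1+\epsilon)\int_0^{T-t} R e^{-\epsilon s}\,ds = \frac{1+\epsilon}{\epsilon}\left(1 - e^{-\epsilon(T-t)}\right)R,\end{aligned}$$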

which coincides with $\psi_{1,0}(t)R$ as obtained from the HJB analysis. A similar
computation can be made for $0 \le t \le T - \ln(1+\epsilon)/\epsilon$ to verify that the
solution of the partial differential equation coincides with the expectation of the
stochastic integral. Also note that $\lim_{\epsilon \to 0} v(R,t) = (T - t)R$. This is
expected, as $\mu \to 0$, and so $\mathbb{E}[R_s \mid R_t = R] \to R$.
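The identity between $\psi_{1,0}(t)R$ and the expectation of the stochastic integral can also be checked by simulation. The following Monte Carlo sketch is our own illustration (function names, seed, and path/step counts are assumptions chosen for speed, not values from the text); it uses the exact log-normal step of the geometric Brownian motion on each time step and a left-point rule for the $ds$-integral, so it carries a small discretization bias in addition to sampling error.

```python
import math
import random


def psi_closed_form(t, T, eps):
    """psi_{1,0}(t) for gamma = 1, rho = 0, as derived in Section 2.1."""
    t_switch = T - math.log1p(eps) / eps
    if t >= t_switch:
        return (1.0 + eps) / eps * (1.0 - math.exp(-eps * (T - t)))
    return (math.exp(eps * (T - t)) + eps ** 2 - 1.0) / (eps * (1.0 + eps))


def mc_value(R0, t0, T, eps, sigma, n_paths=2000, n_steps=200, seed=1):
    """Monte Carlo estimate of E[int_{t0}^T (1 - mu_s) R_s ds | R_{t0} = R0]
    under the bang-bang policy mu = +eps before the switch, -eps after it."""
    rng = random.Random(seed)              # fixed seed for reproducibility
    t_switch = T - math.log1p(eps) / eps
    dt = (T - t0) / n_steps
    total = 0.0
    for _ in range(n_paths):
        R, integral = R0, 0.0
        for k in range(n_steps):
            s = t0 + k * dt
            mu = eps if s < t_switch else -eps
            integral += (1.0 - mu) * R * dt  # left-point rule for the ds-integral
            # exact GBM step with mu held fixed on [s, s + dt]
            z = rng.gauss(0.0, 1.0)
            R *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        total += integral
    return total / n_paths
```

For example, with $R = 1$, $T = 2$, and $\epsilon = \sigma = 0.2$, the estimate `mc_value(1.0, 0.0, 2.0, 0.2, 0.2)` should land within a few percent of `psi_closed_form(0.0, 2.0, 0.2)`, which is about 2.216.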

The study of bang-bang optimal control problems has a deep history, with many
approaches. A very useful and well-studied path can be found in the work of [?],
among others. This method solves for the sign of the gradient of the value
function, $\nabla v$, using a Girsanov transformation [5], and the general solution
follows. For a more detailed study, consult [?].

Also, the reader may notice a similarity of our approach to the iconic Merton
portfolio problem [22]. In our model, we multiply the power law growth rate by the
processing rate, which depends on the control, and we bound the size of our
control. Still, it should not be too surprising that the solution is obtained by the
same method, separation of variables, and has the same power law dependence on
the underlying asset, in our case reputation. Unfortunately, the general $h(R)$ case
does not admit the same solution form, whether we use a geometric Brownian
motion or stochastic Nerlove-Arrow dynamics [26]; see Section 3.

For the sake of comparison, one could also analyze this problem as a
single-variable problem: the decision of when to make a single switch. For
completeness, we present this in the Appendix and show that the solution obtained
via this method is the same as that of the HJB approach.
3 Explicit Resource Allocation Mechanism: Empirical Model of Mink and Seifert



In the previous model, we assumed an implicit mechanism for generosity. However,
an explicit mechanism has been proposed in the novel work of [25]. There, the
authors propose and empirically justify a price per unit of

$$h(R) = A + C\left(1 - \frac{1}{\ln(e + R)}\right),$$

where $e$ is Euler's number (so that $h(0) = A$), $A$ relates to the inherent value of
the object for sale, and $C$ is a parameter to be fitted. This is accomplished by
obtaining data using an auction robot and then computing a single regression, which
gives $C = 2.50$. It should be noted that, among the many economic papers on the
effect of reputation on bidding and final sales price, the Mink-Seifert model is the
first one we found to give an explicit relationship between reputation and price.
The Mink-Seifert model also suggests a multiple regression formula in which other
factors, such as shipping costs and whether a "buy-it-now" price is offered, are
considered as well. In that case, $C = 1.93$, and the authors of [25] comment that
this implies ". . . the correlation between a seller's revenues and her feedback
score can be attributed to a large part to the fact that highly experienced sellers
both have a higher feedback score and design the auction more favorably". In fact,
they show that the coefficient attributed to shipping costs is larger than one,
implying that customers put a high value on shipping when deciding on their bids,
and that savvy agents take this into consideration. Finally, they posit that the
horizon $T$ does not affect the revenue stream as much as the shipping cost and
reputation factors, and so we consider an arbitrary finite horizon model here.
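For intuition about the shape of the Mink-Seifert price curve, here is a minimal sketch (our own helper, not code from [25]) that evaluates $h(R) = A + C(1 - 1/\ln(e + R))$ with the single-regression value $C = 2.50$ as the default. The inherent value $A$ must be supplied by the user; the value used in the example below is an arbitrary illustration.

```python
import math


def mink_seifert_price(R, A, C=2.50):
    """Price per unit h(R) = A + C * (1 - 1/ln(e + R)).

    h(0) = A because ln(e) = 1, and h(R) increases toward A + C as R grows,
    but only logarithmically.  C defaults to the single-regression fit 2.50.
    """
    if R < 0:
        raise ValueError("reputation must be nonnegative")
    return A + C * (1.0 - 1.0 / math.log(math.e + R))
```

The premium saturates slowly: half of the maximal premium $C$ is reached when $\ln(e + R) = 2$, i.e. at $R = e^2 - e \approx 4.67$, while getting close to the full premium requires reputation scores many orders of magnitude larger.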

As an initial approach, we consider only the effects of reputation and leave the
more general model with shipping costs for future work. Also, we consider the
reputation mechanism first suggested by the work of Nerlove and Arrow [26] and
later generalized to stochastic settings by, for instance, [32, 29, 28]. The wealth
growth and reputation dynamics are given by

$$dW_t = G(R_t, \mu_t)\,dt, \qquad G(R,\mu) = p(\mu)\left[A + \gamma\left(1 - \frac{1}{\ln(e+R)}\right)\right], \qquad dR_t = (\mu_t - \kappa R_t)\,dt + \sigma\,dB_t, \tag{3}$$

where $\kappa$ is a proportionality constant first proposed in [26], and $\gamma$
represents the maximum premium earned over the inherent value $A$ by the seller due
to reputation. In our initial approach in the previous section, we took
$p(\mu) = 1 - \mu$ for $\mu \in [-\epsilon, \epsilon]$; we can now extend the processing rate
$p(\mu)$ as a function of expenditure. For a finite horizon, a company may be
interested in capping its expenditure rate, and it may siphon only a small amount
away from the advertising budget in order to allocate that resource to processing
instead. In general, we can have a rate $p(\mu)$ and a maximum possible processing
rate $M$ that satisfy

$$\lim_{\mu \to 0} p(\mu) = 1, \qquad \lim_{\mu \to +\infty} p(\mu) = 0, \qquad \lim_{\mu \to -\infty} p(\mu) = M.$$
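One concrete rate with these three limits, given here purely as a hypothetical example (it is not proposed in the text), is the logistic form $p(\mu) = M/(1 + (M-1)e^{a\mu})$ with $M > 1$ and $a > 0$. Choosing $a = M/(M-1)$ makes $p'(0) = -1$, so the sketch below agrees to first order with the Section 2 choice $p(\mu) = 1 - \mu$ for small $|\mu|$.

```python
import math


def processing_rate(mu, M=2.0, a=None):
    """Hypothetical logistic processing rate p(mu) = M / (1 + (M - 1) e^{a mu}).

    p(0) = 1, p -> 0 as mu -> +inf, and p -> M as mu -> -inf.  By default
    a = M / (M - 1), which gives p'(0) = -1, i.e. p(mu) ~ 1 - mu near mu = 0.
    """
    if M <= 1.0:
        raise ValueError("M must exceed 1")
    if a is None:
        a = M / (M - 1.0)
    z = a * mu
    if z > 0:
        # rewrite as M e^{-z} / (e^{-z} + M - 1) to avoid overflow for large z
        ez = math.exp(-z)
        return M * ez / (ez + (M - 1.0))
    return M / (1.0 + (M - 1.0) * math.exp(z))
```

The parameter $M$ caps the processing rate reachable by diverting resources away from advertising, while $a$ controls how quickly the rate saturates in either direction.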



Our previous control problem in Section 2 returned a switching, or pulsing, type
advertising campaign that was only time dependent. This made the subsequent
verification of the HJB solution easier. With the stochastic Nerlove-Arrow [26]
dynamics for reputation and the Mink-Seifert [25] growth rate for sales, we expect
the switching to depend on both reputation and time. To investigate this, we again
consider (i) $p(\mu) = 1 - \mu$, and (ii) a seller who wishes to cap her advertising
expenditure, but now as a total rate that is not necessarily proportional to her
current reputation level:



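$$V(R,t) = \sup_{-\epsilon \le \mu \le \epsilon}\mathbb{E}\left[\int_t^{T \wedge \tau}(1-\mu_s)\left[A + \gamma\left(1 - \frac{1}{\ln(e+R_s)}\right)\right]ds \,\Big|\, R_t = R\right],$$

where $\tau = \inf\{s > t : R_s = 0\}$.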

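The corresponding nonlinear HJB boundary value problem is

$$\max_{-\epsilon \le \mu \le \epsilon}\left\{\frac{\partial V}{\partial t} + (\mu - \kappa R)\frac{\partial V}{\partial R} + \frac{\sigma^2}{2}\frac{\partial^2 V}{\partial R^2} + (1-\mu)\left[A + \gamma\left(1 - \frac{1}{\ln(e+R)}\right)\right]\right\} = 0,$$

$$V(R,T) = 0, \qquad V(0,t) = 0.$$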



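The candidate for the optimal control is again a pulsing strategy, switching between
the values $\epsilon$ and $-\epsilon$, given by

$$\mu = \epsilon\,\mathrm{sgn}\left(\frac{\partial V}{\partial R} - \left[A + \gamma\left(1 - \frac{1}{\ln(e+R)}\right)\right]\right).$$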
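The resulting boundary value problem becomes

$$\frac{\partial V}{\partial t} - \kappa R\frac{\partial V}{\partial R} + \frac{\sigma^2}{2}\frac{\partial^2 V}{\partial R^2} + \epsilon\left|\frac{\partial V}{\partial R} - A - \gamma\left(1 - \frac{1}{\ln(e+R)}\right)\right| + A + \gamma\left(1 - \frac{1}{\ln(e+R)}\right) = 0,$$

$$V(R,T) = 0, \qquad V(0,t) = 0.$$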
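To see how such a switching curve in $(R, t)$ might be explored numerically, the following is a rough sketch of an explicit, monotone finite-difference scheme for this boundary value problem; it is our own illustration and not the forthcoming analysis mentioned below. The truncation of the reputation axis at a hypothetical $R_{\max}$, the zero-slope condition imposed there, first-order upwinding, and the CFL-type time step are all assumptions of the sketch. Only $\mu = \pm\epsilon$ are compared at each node because the bracket in the HJB equation is affine in $\mu$.

```python
import math


def solve_hjb_mink_seifert(A, gamma, kappa, sigma, eps, T,
                           R_max=10.0, n_R=100, n_t=None):
    """Explicit, monotone finite-difference sketch for the Section 3 HJB problem.

    Marches backward from V(R, T) = 0 on the grid R_i = i * dR in [0, R_max], using
        V_t + max_{mu in {-eps, +eps}} [(mu - kappa R) V_R + (1 - mu) h(R)]
            + (sigma^2 / 2) V_RR = 0,   V(0, t) = 0,
    with h(R) = A + gamma * (1 - 1/ln(e + R)).  ASSUMPTIONS (ours, not the
    paper's): truncation at R_max with a zero-slope condition there, first-order
    upwinding, and a CFL-type time step.  Returns (R grid, V(., 0), mu(., 0)).
    """
    dR = R_max / n_R
    R = [i * dR for i in range(n_R + 1)]
    h = [A + gamma * (1.0 - 1.0 / math.log(math.e + r)) for r in R]
    # Monotonicity requires dt * (sigma^2 / dR^2 + max|drift| / dR) <= 1.
    dt_max = 1.0 / (sigma ** 2 / dR ** 2 + (kappa * R_max + eps) / dR)
    if n_t is None:
        n_t = math.ceil(T / (0.9 * dt_max))
    dt = T / n_t
    if dt > dt_max:
        raise ValueError("time step too large for a monotone explicit scheme")

    V = [0.0] * (n_R + 1)            # terminal condition V(R, T) = 0
    control = [0.0] * (n_R + 1)      # boundary nodes keep the placeholder 0
    for _ in range(n_t):
        new = V[:]
        for i in range(1, n_R):
            diffusion = 0.5 * sigma ** 2 * (V[i + 1] - 2.0 * V[i] + V[i - 1]) / dR ** 2
            best, best_mu = -math.inf, 0.0
            for mu in (-eps, eps):
                drift = mu - kappa * R[i]
                # one-sided difference taken in the direction of the drift (upwinding)
                if drift > 0:
                    grad = (V[i + 1] - V[i]) / dR
                else:
                    grad = (V[i] - V[i - 1]) / dR
                value = drift * grad + (1.0 - mu) * h[i]
                if value > best:
                    best, best_mu = value, mu
            new[i] = V[i] + dt * (best + diffusion)
            control[i] = best_mu
        new[0] = 0.0                  # absorbing boundary V(0, t) = 0
        new[n_R] = new[n_R - 1]       # assumed zero-slope condition at R_max
        V = new
    return R, V, control
```

Because each update is a convex combination of neighboring values plus the nonnegative running reward, the computed $V$ stays between $0$ and $(1+\epsilon)(A+\gamma)T$. Plotting the returned `control` against `R` at several horizons would show where, in this sketch, the sign of $\partial V/\partial R - h(R)$ flips.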



Naturally, the strategy will switch depending on both the time and the current
reputation state. This differs from the power law model, which led to a pulsing
strategy that depends on time only. Of course, the solution of the HJB equation is
only a candidate for the solution of the control problem, and it must be verified
as in the previous section. The solution of this HJB equation, however, may very
well be non-smooth, and so the notion of generalized solutions (see [?]) must be
invoked. Further analysis of this equation is forthcoming.



4 Conclusion and Future Work 

In this work, we defined a general framework for the problem of selling goods
online when buyer feedback factors into the sales rate. The framework was a
stochastic optimal control problem, where the seller wishes to maximize her average
wealth level at a fixed time of her choice. We presented a method to incorporate
reputation management into the optimal design of an online auction (online store)
by a seller. Then, we introduced and analyzed the corresponding
Hamilton-Jacobi-Bellman equation. To obtain intuition, we first analyzed a model
where the revenue per sale, or the price per unit, depended solely on reputation
and was multiplied by a mark-down factor. This factor represented the loss per item
that the seller could expect for behaving generously today in return for a larger
reputation score and larger potential revenue in the future. As an example, we
considered a power law growth rate function, which provided insight into the
general solution method.

Subsequently, we shifted our attention to an empirically justified model and
proposed that a seller might optimally design her online store in accordance with
the Mink-Seifert model, an explicit mechanism proposed in the novel work of [25].
There, the authors propose and empirically justify a price per unit of
$h(R) = A + C(1 - 1/\ln(e + R))$, where $A$ relates to the inherent value of the object
for sale and $C$ is a parameter to be fitted that represents the maximum premium
expected due to reputation over inherent value. This is accomplished by obtaining
data using an auction robot and then computing a single regression, which gives
$C = 2.50$. It should be noted that, among the many economic papers on the effect of
reputation on bidding and final sales price, the Mink-Seifert model is the first one
we found to give an explicit relationship between reputation and price.

We have incorporated the reputation mechanism first suggested by Nerlove and
Arrow [26] and later generalized to stochastic settings in [32, 29, 28]. With these
modeling considerations, we formulated the control problem for the bivariate
Revenue-Reputation Markov process and derived the corresponding HJB equation.
Naturally, the strategy is expected to switch depending on both the time and the
current reputation state. This differs from the power law model, which led to a
pulsing strategy that depends on time only. Further analysis of the resulting HJB
equation, including numerics, is forthcoming.
Appendix



Here we consider the single switching time problem. More precisely, consider
$dR_t = R_t(\mu\,dt + \sigma\,dB_t)$, with known $R_s$ and constant $\mu$. Then it easily
follows that

$$\mathbb{E}[R_t \mid R_s = R] = R e^{\mu(t-s)}. \tag{4}$$
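The step behind (4) is short. Assuming, as stated, a constant $\mu$ on $[s, t]$, the explicit solution of the geometric Brownian motion, the independence of $B_t - B_s$ from $R_s$, and the Gaussian moment generating function give

$$R_t = R_s\exp\Big(\big(\mu - \tfrac{\sigma^2}{2}\big)(t-s) + \sigma(B_t - B_s)\Big), \qquad \mathbb{E}\big[e^{\sigma(B_t - B_s)}\big] = e^{\sigma^2(t-s)/2},$$

$$\text{so that}\quad \mathbb{E}[R_t \mid R_s = R] = R\,e^{(\mu - \sigma^2/2)(t-s)}\,e^{\sigma^2(t-s)/2} = R\,e^{\mu(t-s)}.$$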



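Now, for a fixed switching time $t \in [0, 1]$, consider

$$\mu_r = \begin{cases}\epsilon, & 0 \le r < t,\\ -\epsilon, & t \le r \le 1.\end{cases}$$

Hence,

$$\begin{aligned} f(R,t) &:= \mathbb{E}\left[\int_0^1 (1-\mu_s)R_s\,ds \,\Big|\, R_0 = R\right] = \mathbb{E}\left[\int_0^t (1-\epsilon)R_s\,ds \,\Big|\, R_0 = R\right] + \mathbb{E}\left[\int_t^1 (1+\epsilon)R_s\,ds \,\Big|\, R_0 = R\right]\\ &= (1-\epsilon)\int_0^t \mathbb{E}[R_s \mid R_0 = R]\,ds + (1+\epsilon)\int_t^1 \mathbb{E}[R_s \mid R_0 = R]\,ds, \end{aligned} \tag{5}$$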

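where the last equality follows by exchanging expectation and integration.
Furthermore, by (4) and the tower property of conditional expectations, the
expression (5) equals

$$\begin{aligned} f(R,t) &= (1-\epsilon)\int_0^t R e^{\epsilon s}\,ds + (1+\epsilon)\int_t^1 \mathbb{E}\left[R_t e^{-\epsilon(s-t)} \mid R_0 = R\right]ds\\ &= (1-\epsilon)R\,\frac{e^{\epsilon t} - 1}{\epsilon} + (1+\epsilon)\int_t^1 e^{-\epsilon(s-t)}R e^{\epsilon t}\,ds\\ &= R\left[\frac{1-\epsilon}{\epsilon}\left(e^{\epsilon t} - 1\right) + \frac{1+\epsilon}{\epsilon}\left(e^{\epsilon t} - e^{2\epsilon t - \epsilon}\right)\right]. \end{aligned}$$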



For $f(R,t)$ defined in (5), we now seek the absolute maximum on $[0, 1]$. Setting its
first derivative to zero, $\partial_t f(R,t_1) = 0$, gives $t_1 = 1 - \ln(1+\epsilon)/\epsilon$.
Substituting $t_1$ back into $f$, we obtain

$$f(R,t_1) = R\,\frac{e^\epsilon + \epsilon^2 - 1}{\epsilon(1+\epsilon)}.$$

Now, simple analysis shows that $f(R,1) < f(R,0) < f(R,t_1)$ for all $\epsilon \in (0, 1]$.
This is, of course, the result obtained in the HJB analysis in Section 2.1 for
$h(R) = R$, $\rho = 0$, and $T = 1$. In other words, for $h(R) = R$,

$$v(R,0) = \sup_{0 \le s \le 1} f(R,s).$$
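The single-switch computation lends itself to a brute-force check. The sketch below (our own illustration with hypothetical function names) evaluates the closed form of $f(R,t)$ derived above and maximizes it over a uniform grid on $[0, 1]$, so the grid maximizer can be compared with $t_1 = 1 - \ln(1+\epsilon)/\epsilon$ and the ordering $f(R,1) < f(R,0) < f(R,t_1)$ can be spot-checked.

```python
import math


def f_single_switch(R, t, eps):
    """Appendix closed form of f(R, t): mu = +eps on [0, t), mu = -eps on [t, 1]."""
    return R * ((1.0 - eps) / eps * (math.exp(eps * t) - 1.0)
                + (1.0 + eps) / eps * (math.exp(eps * t) - math.exp(2.0 * eps * t - eps)))


def best_switch_on_grid(R, eps, n=20000):
    """Brute-force maximiser of f(R, .) over the grid {0, 1/n, ..., 1}."""
    grid = [k / n for k in range(n + 1)]
    return max(grid, key=lambda t: f_single_switch(R, t, eps))
```

For $\epsilon = 0.2$, the grid maximizer should lie near $t_1 = 1 - \ln(1.2)/0.2 \approx 0.0884$, and the maximal value matches $(e^\epsilon + \epsilon^2 - 1)/(\epsilon(1+\epsilon))$, i.e. $\psi_{1,0}(0)$ with $T = 1$.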